Neural Network Foundations

This notebook follows the third lesson of the fast.ai Practical Deep Learning for Coders course.

The related resources are as follows:

  1. Lesson 3 lecture

  2. Deep Learning for Coders with Fastai and PyTorch: AI Applications Without a PhD Chapter 4

  3. Course Notebooks:

    1. Which image models are best?
    2. How does a neural net really work?
# Suppress only UserWarning
import warnings

warnings.filterwarnings('ignore', category=UserWarning)

Detect whether the notebook is running on Kaggle

It’s a good idea to ensure you’re running the latest version of any libraries you need. !pip install -Uqq <libraries> upgrades the listed libraries to their latest versions.

import os
iskaggle = os.environ.get('KAGGLE_KERNEL_RUN_TYPE', '')

if iskaggle:
    print('Is running on Kaggle.')
    !pip install -Uqq fastai

Choosing the best image model

timm

PyTorch Image Models (timm) is a wonderful library by Ross Wightman which provides state-of-the-art pre-trained computer vision models. It’s like Huggingface Transformers, but for computer vision instead of NLP (and it’s not restricted to transformers-based models)!

Ross has been kind enough to help me understand how to best take advantage of this library by identifying the top models. I’m going to share here some of what I’ve learned from him, plus some additional ideas.

The data

Ross regularly benchmarks new models as they are added to timm, and puts the results in a CSV in the project’s GitHub repo.

import pandas as pd

# Load the results data first
df_results = pd.read_csv('image_model_results/results-imagenet.csv')
df_results['merge_key'] = df_results['model'].str.split('.', n=1).str[0]

def get_data(col):
    # Load the benchmark data
    df_bench = pd.read_csv('image_model_results/benchmark-infer-amp-nhwc-pt240-cu124-rtx4090.csv')
    df = df_bench.merge(df_results, left_on='model', right_on='merge_key', suffixes=('_bench', '_results'))

    model_col = 'model_bench'
    df['secs'] = 1. / df[col]
    # Extract the architecture family from the benchmark model name
    # (raw strings avoid invalid-escape warnings for \d)
    df['family'] = df[model_col].str.extract(r'^([a-z]+?(?:v2)?)(?:\d|_|$)')
    # Filter out models ending in 'gn'
    df = df[~df[model_col].str.endswith('gn')]
    # Split ImageNet-22k pretrained models and 'd' resnet variants into their own families
    in22 = df[model_col].str.contains('in22', na=False)
    df.loc[in22, 'family'] = df.loc[in22, 'family'] + '_in22'
    resd = df[model_col].str.contains(r'resnet.*d', na=False)
    df.loc[resd, 'family'] = df.loc[resd, 'family'] + 'd'

    # Keep only the families we want to plot
    if 'family' in df.columns and not df['family'].isnull().all():
        return df[df['family'].str.contains(r'^re[sg]netd?|beit|convnext|levit|efficient|vit|vgg|swin', na=False)]
    print("Warning: 'family' column is missing or empty before final filtering.")
    return pd.DataFrame(columns=df.columns)

df = get_data('infer_samples_per_sec')

Inference results

Here are the results for inference performance (see the last section for training performance). In this chart:

  • the x axis shows how many seconds it takes to process one image (note: it’s a log scale)
  • the y axis is the accuracy on Imagenet
  • the size of each bubble is proportional to the size of images used in testing
  • the color shows what “family” the architecture is from.

Hover your mouse over a marker to see details about the model. Double-click in the legend to display just one family. Single-click in the legend to show or hide a family.

Note: on my screen, Kaggle cuts off the family selector and some plotly functionality – to see the whole thing, collapse the table of contents on the right by clicking the little arrow to the right of “Contents”.

import plotly.express as px
w,h = 1000,800

def show_all(df, title, size):
    return px.scatter(df, width=w, height=h, size=df[size]**2, title=title,
        x='secs',  y='top1', log_x=True, color='family', hover_name='merge_key', hover_data=[size])
show_all(df, 'Inference', 'infer_img_size')

Fitting a function with gradient descent

I can’t express how much I enjoyed this part. Jeremy explains it so beautifully and simply that it lowers the barrier for everyone. He walks us step by step toward understanding the details of neural network foundations with simple, intuitive examples and visualizations, and he breaks down the scary keywords and jargon with his clear explanations.
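Jeremy’s walkthrough uses PyTorch and its autograd; as a rough sketch of the same idea in plain Python (my own toy example, not the course code), here is gradient descent fitting a quadratic to noisy data, with the MSE gradients derived by hand:

```python
import random

random.seed(0)

def f(x, params):
    a, b, c = params
    return a * x**2 + b * x + c

# Toy data from a known quadratic, plus a little noise
true_params = (3.0, 2.0, 1.0)
xs = [i / 10 for i in range(-20, 21)]
ys = [f(x, true_params) + random.gauss(0, 0.1) for x in xs]

def grad(params):
    # Analytic gradient of the mean squared error w.r.t. a, b, c
    ga = gb = gc = 0.0
    for x, y in zip(xs, ys):
        err = 2 * (f(x, params) - y) / len(xs)
        ga += err * x**2
        gb += err * x
        gc += err
    return ga, gb, gc

params = [1.0, 1.0, 1.0]  # arbitrary starting point
lr = 0.05                 # learning rate
for step in range(2000):
    ga, gb, gc = grad(params)
    params[0] -= lr * ga
    params[1] -= lr * gb
    params[2] -= lr * gc

print([round(p, 2) for p in params])  # should land close to (3.0, 2.0, 1.0)
```

Each step nudges the parameters a little in the direction that reduces the loss; that loop is the whole of gradient descent.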

Then he explains how a neural network approximates any given function and he shows the magic of multiple ReLU.
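The “magic of multiple ReLUs” can be sketched in a few lines (my own illustration, not the lecture’s code): each (weight, bias) pair contributes one bend, so a sum of shifted, scaled ReLUs builds an arbitrarily wiggly piecewise-linear function:

```python
def relu(x):
    return max(0.0, x)

def piecewise(x, units):
    # Each (weight, bias) pair is one "unit"; summing scaled,
    # shifted ReLUs gives a piecewise-linear function of x
    return sum(w * relu(x + b) for w, b in units)

# Two ReLUs already bend the line twice, making a little "hat";
# more units mean more bends, hence better approximations
units = [(1.0, 0.0), (-2.0, -1.0)]
for x in [0.0, 0.5, 1.0, 1.5, 2.0]:
    print(x, piecewise(x, units))
```

With enough units (and a second layer to combine them), such sums can approximate any reasonable function, which is the intuition behind the universal approximation result Jeremy alludes to.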

The most fascinating part of Jeremy’s lecture was when he showed how he built a neural network in Excel. I know Excel fans will love it, but I think it also shows how thoroughly Jeremy has mastered the concept, that he can explain it so beautifully and make it memorable for everyone.

Finally, I found Jeremy’s response to those who asked him what happens next, after learning the basics and foundations of deep learning, important, and I bring it here in full.

How to recognise an owl

OK great, we’ve created a nifty little example showing that we can draw squiggly lines that go through some points. So what?

Well… the truth is that actually drawing squiggly lines (or planes, or high-dimensional hyperplanes…) through some points is literally all that deep learning does! If your data points are, say, the RGB values of pixels in photos of owls, then you can create an owl-recogniser model by following the exact steps above.

Students often ask me at this point “OK Jeremy, but how do neural nets actually work”. But at a foundational level, there is no “step 2”. We’re done – the above steps will, given enough time and enough data, create (for example) an owl recogniser, if you feed in enough owls (and non-owls).

The devil, I guess, is in the “given enough time and enough data” part of the above sentence. There’s a lot of tweaks we can make to reduce both of these things. For instance, instead of running our calculations on a normal CPU, as we’ve done above, we could do thousands of them simultaneously by taking advantage of a GPU. We could greatly reduce the amount of computation and data needed by using a convolution instead of a matrix multiplication, which basically means skipping over a bunch of the multiplications and additions for bits that you’d guess won’t be important. We could make things much faster if, instead of starting with random parameters, we start with parameters of someone else’s model that does something similar to what we want (this is called transfer learning).
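The convolution point in the quote can be made concrete: a convolution is equivalent to multiplying by a matrix that is mostly zeros, so most of a dense layer’s multiplications and additions are simply skipped. A small NumPy sketch of this equivalence (my own illustration; strictly speaking it computes cross-correlation, which is what deep-learning “convolutions” compute anyway):

```python
import numpy as np

rng = np.random.default_rng(0)
x = rng.standard_normal(8)          # a tiny 1-D "signal"
k = np.array([1.0, -2.0, 1.0])      # a 3-tap kernel

# Direct (valid) convolution: slide the kernel along the signal
conv = np.array([x[i:i + 3] @ k for i in range(len(x) - 2)])

# The same result as a matrix multiply where the matrix is mostly
# zeros -- this is the sense in which a convolution "skips" most of
# the work a dense (fully-connected) layer would do
W = np.zeros((len(x) - 2, len(x)))
for i in range(len(x) - 2):
    W[i, i:i + 3] = k               # kernel weights on a diagonal band

assert np.allclose(W @ x, conv)
```

A dense layer here would need 6 × 8 = 48 independent weights; the convolution reuses the same 3, which is where the savings in computation and data come from.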